FOREWORD

This report contains the results of the second 18 months (December
15, 1968 - June 30, 1970) of effort toward developing an Information
Processing Laboratory for research and education in library science.
The work was supported by a grant (OEG-1-7-071085-4286) from the Bureau
of Research of the Office of Education, U.S. Department of Health,
Education, and Welfare and also by the University of California. The
principal investigator was M.E. Maron, Professor of Librarianship.

This report is being issued as six separate volumes by the Institute
of Library Research, University of California, Berkeley. They are:

* Maron, M.E. and Don Sherman, et al. An Information Processing
  Laboratory for Education and Research in Library Science: Phase 2.
  Contents: Introduction and Overview; Problems of Library
  Science; Facility Development; Operational Experience.

* Mignon, Edmond and Irene L. Travis. LABSEARCH: ILR Associative
  Search System Terminal Users' Manual.
  Contents: Basic Operating Instructions; Commands; Scoring;
  Measures of Association; Subject Authority List.

* Meredith, Joseph C. Reference Search System (REFSEARCH) Users' Manual.
  Contents: Rationale and Description; Definitions; Index and
  Coding Key; Retrieval Procedures; Examples.

* Silver, Stephen S. and Joseph C. Meredith. DISCUS Interactive System
  Users' Manual.
  Contents: Basic On-Line Interchange; DISCUS Operations;
  Programming in DISCUS; Concise DISCUS Specifications;
  System Author Mode; Exercises.

* Smith, Stephen F. and William Harrelson. TMS: A Terminal Monitor
  Contents: Part I: Users' Guide - A Guide to Writing Programs
  for TMS
  Part II: Internals Guide - A Program Logic Manual for
  the Terminal Monitor System

* Aiyer, Arjun K. The CIMARON System: Modular Programs for the
  Organization and Search of Large Files.
  Contents: Data Base Selection; Entering Search Requests; Search
  Results; Record Retrieval Controls; Data Base Generation.

Because of the joint support provided by the File Organization Project
(OEG-1-7-071083-5068) for the development of DISCUS and of TMS, the volumes
concerned with these programs are included as part of the final report for
both projects. Also, the CIMARON System, whose development was supported by
the File Organization Project, has been incorporated into the Laboratory
operation and therefore, in order to provide a balanced view of the total
facility obtained, that volume is included as part of this Laboratory project
report. (See Shoffner, R.M., et al., The Organization and Search of
Bibliographic Records in On-Line Computer Systems: Project Summary.)
1. INTRODUCTION

1.1 Overview

LABSRC3C is a search program to teach and demonstrate automatic
retrieval techniques. In response to requests in the form of Boolean
expressions, LABSRC3C carries out a search which may be automatically
expanded using terms associated with the original request terms.
The degree of association of one term with another is determined by
using one of eight statistical association measures operating on
index-term co-occurrence data. The program computes a probable
relevance score for each document it retrieves. It operates
interactively via Sanders video terminals and utilizes a cathode ray
tube (CRT) screen for all user and program displays. It offers a
range of options for formulating search requests and for modifying
search strategy.

The materials for constructing the data base are

1. A set of documents, hereafter called the corpus.
   The corpus consists of journal articles in the
   field of information science.

2. A controlled vocabulary used for indexing the
   documents, called the Subject Authority List.

Each journal article is assigned an accession number and indexed. A
machine record for each document, consisting of its accession number
and the index terms that have been applied to it, is generated.
The file of all these index-term records is called the MASTERI file,
and it is on this file that the searches are carried out. For an
example of a record from the MASTERI file, see Fig. 2 on page 20.

An auxiliary file, called MASTERA, contains the abstracts of
the documents in the corpus and is stored in a separate area of
disc storage. The MASTERA file is not used directly for searching,
but abstracts may be readily requested on-line by the user and
displayed on the Sanders video terminal.

1.2 The Corpus 

The records in the MASTERI file describe a set of 400 articles
on information science published since 1957, and chosen chiefly from
the primary research journals of ACM, ASIS, ASLIB, AFIPS, and (more
selectively) other journals and symposia of similar stature and
character. At the present time, the most recent articles in the
corpus date from 1968, but further additions to this collection are
to be made from time to time.

The texts of the documents are not stored in the computer, but
the documents themselves, in microfiche format, are kept in the
Information Processing Laboratory and are easily accessible. The
Laboratory also contains reading apparatus as well as a microfiche
copier.



1.3 The Indexing 

Index terms from the Subject Authority List have been manually
assigned to documents in the corpus, with an average indexing depth
of approximately 15:1. The index terms are not all single-word
expressions. Some of them are pre-coordinated phrases, such as
manual indexing or state-of-the-art; thus technically speaking
they are descriptors rather than index terms in the conventional
sense.

The Subject Authority List is included in Appendix 1.

1.4 The Accession Numbers

Each document is manually assigned a five-character accession
number, consisting of a letter followed by four digits. The letters
used for these accession numbers are A, B, and X. There is no
classificatory significance to these letters.
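
The accession number format just described can be checked mechanically. The following is a minimal sketch only; the function name and the pattern are illustrative assumptions and are not part of LABSRC3C.

```python
import re

# Pattern assumed from the description above: one of the letters A, B, or X
# followed by exactly four digits (for example, B1218).
ACCESSION_RE = re.compile(r"[ABX][0-9]{4}")


def is_valid_accession(number: str) -> bool:
    """Return True if the string has the five-character accession format."""
    return ACCESSION_RE.fullmatch(number) is not None


print(is_valid_accession("B1218"))  # a letter from {A, B, X} plus four digits
print(is_valid_accession("C0001"))  # C is not one of the letters in use
```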

1.5 The Search Modes 

It is possible to specify either of two kinds of automatic
searching procedures for the identification and retrieval of
documents in response to a search request: direct match or associative
search.

When the system is searching in direct match mode, it will
retrieve only those documents whose index terms exactly match
the specifications of the search request. Although such a search
will be generally satisfactory in the sense that most or all of the
documents retrieved will have a readily recognizable correspondence
with the topics specified by the index terms in the search request,
the search may nevertheless overlook some relevant documents
because of variations in indexing or search request formulation.

In order to get around this difficulty, LABSRC3C offers a means
for extending the search by putting the system into associative
retrieval mode. In this procedure the system automatically
expands the request and searches not only for documents whose index
terms match those of the request, but also for documents indexed
under terms which have a high statistical association with the
terms in the original request. This automatically elaborates the
request in a direction which has high probability of retrieving
additional relevant documents. The method is not foolproof
since it is based on probability rather than certainty, but
the assumption is that the probability of retrieving additional
relevant documents will be increased by adding associated terms
to those specified in the request.

Table 1

Comparison of the Interpretation of Boolean Operators in Direct
Match Mode and Associative Retrieval Mode
('A' and 'B' stand for any two index terms)

Request: 'A' and 'B'
   Direct Match:  Document must be indexed under both A and B to be
                  retrieved.
   Associative:   Documents must be indexed under one of the three
                  following combinations: (1) both A and B, (2) A and
                  a term highly associated with B, (3) a term highly
                  associated with A and also one highly associated
                  with B.

Request: 'A' or 'B'
   Direct Match:  Document must be indexed under either A or B.
   Associative:   Documents must be indexed under A or B or at least
                  one term which is highly associated with A or B.

Request: 'A' and not 'B'
   Direct Match:  Document must be indexed under A, but not indexed
                  under B.
   Associative:   Documents must be indexed under A or at least one
                  term highly associated with A, but it must not be
                  indexed under B. NOT operators are not expanded in
                  associative mode.

1.6 The Association Files

Two terms are said to be positively associated if they are used
jointly in the indexing of documents more frequently than would be
expected by random chance. An association measure is an algebraic
formula for calculating and giving a numerical representation of
the degree of association between a pair of index terms. The value
obtained from this calculation is called an association value.
Since the determination of this value depends on occurrences of
single terms and co-occurrences of term pairs rather than the
meanings of the terms, it is a statistical rather than a semantic
measure of the closeness of the terms.
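
The idea of positive association can be illustrated by comparing the observed number of co-occurrences with the number expected by chance. The sketch below is an illustration only: it is not one of the eight association measures offered by LABSRC3C (those are described in Chapter 5), the records are invented, and the function names are hypothetical. The chance expectation assumes that the two terms are assigned independently.

```python
# Toy index-term records: accession number -> index terms (invented data).
records = {
    "A0001": {"AUTO. INDEXING", "MANUAL INDEXING"},
    "A0002": {"AUTO. INDEXING", "MANUAL INDEXING", "EFFECTIVENESS"},
    "B0001": {"GRAMMAR", "SYNTAX"},
    "B0002": {"EFFECTIVENESS", "GRAMMAR"},
}


def count(records, *terms):
    """Number of documents indexed under every one of the given terms."""
    return sum(1 for doc_terms in records.values()
               if all(t in doc_terms for t in terms))


def expected_cooccurrence(n_a, n_b, n_docs):
    """Co-occurrences expected by chance if the terms were independent."""
    return n_a * n_b / n_docs


def positively_associated(records, a, b):
    """True if a and b co-occur more often than chance would predict."""
    observed = count(records, a, b)
    expected = expected_cooccurrence(count(records, a), count(records, b),
                                     len(records))
    return observed > expected


# Observed 2 co-occurrences against 2 * 2 / 4 = 1 expected by chance.
print(positively_associated(records, "AUTO. INDEXING", "MANUAL INDEXING"))
# Observed 0 co-occurrences against 1 expected by chance.
print(positively_associated(records, "AUTO. INDEXING", "GRAMMAR"))
```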

There are many different ways of calculating association, and
LABSRC3C permits one to choose from a repertory of association
measures, which differ from each other (in some cases quite
strikingly) in their properties. The association measures may differ
not only in the quantities that they compute for association values,
but also in their determination of which terms are found to be most
highly associated. Thus different measures may often lead to
different retrieval results when searching in associative mode.
The individual measures are described and compared in Chapter 5.

For searching in associative mode, LABSRC3C contains a set
of association files, one for each association measure. Each file
is made up of association tables, one table for each term in the
Subject Authority List. Each table consists of an index term,
the four other terms most highly associated with it, and their
computed association values. The association value of the index
term with itself under each formula is also computed and stored;
therefore, there are five association values in all in each table.
An example of an association table is given and discussed on page 21.
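
The way an association table widens a request can be sketched as follows. This is a simplified illustration under stated assumptions: the table contents and association values are invented, the names ASSOCIATION_FILE, expand, and matches_or are hypothetical, and only the 'A' OR 'B' case of Table 1 is modelled.

```python
# Hypothetical association file: each index term maps to the four other
# terms most highly associated with it, with invented association values.
ASSOCIATION_FILE = {
    "PERFORMANCE": [
        ("PRECISION", 0.41),
        ("RECALL", 0.38),
        ("EFFECTIVENESS", 0.29),
        ("RETRIEVAL", 0.17),
    ],
}


def expand(term, association_file):
    """Return the term together with the terms associated with it."""
    associates = [t for t, _value in association_file.get(term, [])]
    return {term, *associates}


def matches_or(doc_terms, a, b, association_file=None):
    """Evaluate 'A' OR 'B' for one document.

    With no association file this is direct match mode; with one, the
    request is widened to the associated terms (associative mode).
    """
    wanted = {a, b}
    if association_file is not None:
        wanted = expand(a, association_file) | expand(b, association_file)
    return bool(wanted & doc_terms)


doc = {"RECALL", "SYSTEM"}
print(matches_or(doc, "PERFORMANCE", "AUTOMATION"))                    # direct match
print(matches_or(doc, "PERFORMANCE", "AUTOMATION", ASSOCIATION_FILE))  # associative
```

In this toy case the document is missed in direct match mode and retrieved in associative mode, because 'RECALL' appears in the invented table for 'PERFORMANCE'.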

1.7 The Search Request

1.7.1 The Boolean Expression

LABSRC3C provides automatic retrieval of document accession
numbers in response to requests submitted in the form of Boolean
expressions. The components of a Boolean expression are "Boolean"
operators and operands. The operands are, in this case, single
index terms from the Subject Authority List or Boolean expressions
combining such terms. The three logical or "Boolean" operators
are AND, OR, and NOT. Since these operators differ considerably
in their effect on the retrieval output, some care must be taken
in their application. Compare the following results.*

Ex. 1. 'AUTO. INDEXING' AND 'MANUAL INDEXING'. This request
will cause LABSRC3C to retrieve only those documents that are indexed
under both of the terms 'AUTOMATIC INDEXING' and 'MANUAL INDEXING'.

Ex. 2. 'AUTO. INDEXING' OR 'MANUAL INDEXING'. In response to
this request, the system will retrieve all documents indexed under
at least one of these two terms, including documents indexed under
both.

* The interpretation of these examples assumes searching in Direct
Match Mode.
Ex. 3. 'AUTO. INDEXING' AND NOT 'MANUAL INDEXING'. In this
case the system will retrieve only those documents indexed under
'AUTO. INDEXING' that are not also indexed under 'MANUAL INDEXING'.

The Boolean expression need not be limited to two index terms.
It may be of considerable length and complexity, including
parenthetical expressions.

Ex. 4. (('INFO. SCIENCE' OR 'CYBERNETICS') AND 'SOCIAL IMPLIC.')
       AND NOT ('CURRICULUM' OR 'EDUCATION')

Interpretation: This request will cause the system to retrieve
documents that are indexed under both 'INFO. SCIENCE' and
'SOCIAL IMPLIC.' or both 'CYBERNETICS' and 'SOCIAL IMPLICATIONS',
provided they are not also indexed under either 'CURRICULUM' or
'EDUCATION'.

The request would be completely changed if the parentheses
were moved or removed. For example:

'INFO. SCIENCE' OR ('CYBERNETICS' AND 'SOCIAL IMPLIC.')
AND NOT 'CURRICULUM' OR 'EDUCATION'

Interpretation: This request will retrieve documents indexed under
(1) both 'CYBERNETICS' and 'SOCIAL IMPLIC.' or else (2) 'INFO.
SCIENCE' or else (3) 'EDUCATION', providing that none of them are
indexed under 'CURRICULUM'.
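
The fully parenthesized request in Ex. 4 can be traced by hand against a few records. The sketch below assumes direct match mode and uses invented records; it models only the logic of the request, not the program's actual search routine.

```python
# Toy MASTERI-style records: accession number -> index terms (invented data).
MASTERI = {
    "A0001": {"INFO. SCIENCE", "SOCIAL IMPLIC."},
    "A0002": {"CYBERNETICS", "SOCIAL IMPLIC.", "EDUCATION"},
    "B0001": {"CYBERNETICS", "SOCIAL IMPLIC."},
    "X0001": {"INFO. SCIENCE", "CURRICULUM"},
}


def matches_ex4(terms):
    """(('INFO. SCIENCE' OR 'CYBERNETICS') AND 'SOCIAL IMPLIC.')
    AND NOT ('CURRICULUM' OR 'EDUCATION'), in direct match mode."""
    return (
        ("INFO. SCIENCE" in terms or "CYBERNETICS" in terms)
        and "SOCIAL IMPLIC." in terms
        and not ("CURRICULUM" in terms or "EDUCATION" in terms)
    )


retrieved = sorted(acc for acc, terms in MASTERI.items() if matches_ex4(terms))
print(retrieved)  # ['A0001', 'B0001']
```

A0002 is excluded because it is also indexed under 'EDUCATION', and X0001 because it lacks 'SOCIAL IMPLIC.'.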

The procedures for entering Boolean expressions are dis- 
cussed in Chapter 2. 

1.7.2 Weights

Weights may be assigned to operands consisting of either single
terms or parenthetical expressions or to individual terms within
the operands. In general the weights reduce the value which the
weighted term contributes to the relevance computation. The
complexities of the effects of these weights on retrieval are discussed
in Chapter 4. Procedures for inputting weights are discussed in
Chapter 2.

1.8 Output

LABSRC3C responds to a request by searching the MASTERI file
and producing a list of the accession numbers of the documents that
satisfy the specifications of the request. If scoring is requested,
the program will compute and display a relevance score for each
document retrieved. The relevance score represents the degree
to which the indexing of the retrieved document matches the terms
of the request. A detailed explanation of how probable relevance
scores are calculated is given in Chapter 4.

Unless otherwise specified by the user, the retrieved
documents are listed in accession number order. The user, however, may
have the program sort the output by relevance score in either ascending
or descending order by using the SORTA and SORTD commands. (See
Sec. 3.4.2)
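
The three output orders can be pictured with a small sketch. The accession numbers and scores below are invented, and the code only imitates the ordering that the SORTA and SORTD commands are described as producing; it does not reproduce the commands themselves.

```python
# Invented retrieval output: (accession number, relevance score).
results = [("B1218", 0.310), ("A0028", 0.126), ("A0101", 0.540)]

# Default listing: accession number order.
by_accession = sorted(results)

# Ordering described for SORTA: ascending relevance score.
ascending = sorted(results, key=lambda pair: pair[1])

# Ordering described for SORTD: descending relevance score.
descending = sorted(results, key=lambda pair: pair[1], reverse=True)

for accession, score in descending:
    print(f"{accession}  {score:.3f}")
```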

1.9 Further Information 

For an amplified discussion of the rationale and workings of
LABSRC3C, consult Chapter 5 M.E. Maron et al., An Information
Processing Laboratory for Education and Research in Library Science:
Phase I (Berkeley: Institute of Library Research, July 1969).

Copies of this report are available in the Library School Library, 
University of California, Berkeley. 
2. BASIC OPERATING INSTRUCTIONS

2.1 The Log-in Procedure

Since the log-in procedure is changed from time to time, it
does not seem worthwhile to include it in this manual. Current
instructions are usually available as hand-outs in the lab, and
the lab supervisor can give on-the-spot help.

2.2 The Normal Pass

This section describes the six-step sequence of questions and
responses that constitutes the normal program flow of LABSRC3C. A
direct progression through the six-step program flow is called a
normal pass. However, additional commands may also be entered
during the sequence, with the exception of the search step Q04.
Entering a command modifies the normal program flow, and turns
control of the program over to the user. This chapter will be
limited to a description of the normal program flow. The commands
will be discussed separately in Chapter 3.

After the user requests LABSRC3C to be loaded, the main
sequence of the program begins. Six questions are displayed in
separate sequences for user response, as follows:

Q01 DO YOU WANT WORD ASSOCIATION?

Q02 PLEASE SPECIFY ASSOCIATION FILE.

Q03 DO YOU WANT SCORING?

Q04 ENTER BOOLEAN EXPRESSION.

Q05 DO YOU WANT RESULTS DISPLAYED?

Q06 SPECIFY RESTART OR EXIT.

The answers to these questions specify the kind of search that
will be carried out. After typing a reply to each question, hit
SEND BLOCK. This transmits the response to the computer, which
will process it and then go on to the next question.

2.3 The Six Questions Individually Discussed 

Q01 DO YOU WANT WORD ASSOCIATION? 

(i.e. Do you want to search in associative mode?) 



Valid Responses: YES 

NO 

A NO response instructs the system to search only for those
documents whose index terms exactly match the specifications of
the search request, to be submitted in response to Q04. This
form of search is called direct match mode. If the response to
Q01 is NO, the program will skip Q02 and go to Q03.

A YES response means that the system will search not only for
documents whose index terms match those of your request, but also
for documents indexed under terms which, although not specified
in the request, are highly associated (statistically) with the
search request terms. This is called associative retrieval mode.

A YES response will therefore lead the system to retrieve all
of the documents that would be produced from a NO response, and
hopefully some others as well. For further details about term
association, see Section 1.5 and Chapter 5.

Q02 PLEASE SPECIFY ASSOCIATION FILE

(i.e., Which association measure do you wish to
use to expand your request?)

Valid Responses: DOYLE
                 KUHNSG
                 KUHNSGN
                 KUHNSL
                 KUHNSS
                 KUHNSSN
                 KUHNSW
                 KUHNSY

An association file is a listing of each of the index terms
in the Subject Authority List together with the four other index
terms that are most closely associated (statistically) with it.
The names of the different files correspond to different formulas
for computing the closeness between pairs of index terms. The
different methods or measures produce somewhat different results;
therefore, the choice of association measure will affect both
the quantity and the selection of the documents retrieved in
response to your request. We do not yet fully understand how to
choose from the repertory of association measures, although some
suggestions are offered in Chapter 5. If you are uncertain or
indifferent as to choice of file, we suggest KUHNSL.

Q03 DO YOU WANT SCORING?

(i.e., Do you want the program to display probable
document relevance scores when results are
displayed?)

Valid Responses: YES
                 NO

When scoring is specified, LABSRC3C computes a relevance score
in the range (0,1) for each document, which reflects the closeness
between the input request and the document retrieved. Thus, the
relevance score is the degree to which the document indexing
matches the request specification or its expansion. For an
explanation of how the relevance score is determined, see
Chapter 4.

If the response is NO, the system will not compute relevance
scores, and the retrieval output will simply consist of an
unranked set of document citations. If the search is being conducted
in Direct Match Mode, i.e., the response to Q01 was NO, then it
is of no advantage to reply YES to Q03, as all documents retrieved
via direct match will have the same relevance number.

Q04 ENTER BOOLEAN EXPRESSION

Valid Responses: Any Boolean expression consisting of a string
of terms from the Subject Authority List,
connected by the operators AND, OR, or NOT.

Remarks on Q04

1. Each index term must be enclosed in single quotes; e.g.,
   'LANGUAGE' AND 'GRAMMAR' AND NOT 'SYNTAX'

2. Parenthetic expressions may be used; e.g.,

   ('AUTO. INDEXING' OR 'MANUAL INDEXING') AND
   (('SYSTEM' OR 'RETRIEVAL') AND 'EFFECTIVENESS')

3. Weights in the range (0,1) may be applied to elements of
   your Boolean expression. The weight is a 3-digit decimal
   number and must be followed by an asterisk; e.g.,

   'PERFORMANCE' OR .500*('PRECISION' OR 'RECALL')

   Warning: There is no diagnostic for a missing asterisk,
   but if the asterisk is not present, the system
   will ignore the weight in searching.

   For a discussion of the effects of weights on retrieval,
   see Chapter 4.

4. If the Boolean expression is longer than one line,
   LABSRC3C automatically concatenates the input. It
   will appear as if the last character of the line is
   dropped, but it will in fact be processed.

5. The response to Q04 must contain at least one Boolean
   operator. Although LABSRC3C is not intended for
   searching single-term requests, such requests may be
   formulated in the term-operator-term pattern by using
   the term as both of the operands; e.g.,

   'AUTOMATION' AND 'AUTOMATION'.

6. Commands may not be entered in response to Q04.
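
The input rules in the remarks above (quoted terms, the three operators, parentheses, and a 3-digit decimal weight followed by an asterisk) can be summarized in a small tokenizer. This is a sketch under assumptions: the names TOKEN_RE, tokenize, and has_operator are hypothetical, and the actual input scanner of LABSRC3C is not documented here and may behave differently, for instance in silently ignoring a weight that lacks its asterisk.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<weight>\.\d{3}\*)          # weight: 3-digit decimal, then an asterisk
  | (?P<term>'[^']*')              # index term enclosed in single quotes
  | (?P<op>\b(?:AND|OR|NOT)\b)     # Boolean operator
  | (?P<paren>[()])                # parenthesis
  | (?P<space>\s+)                 # blanks between tokens
""", re.VERBOSE)


def tokenize(request):
    """Split a request into (kind, text) pairs; reject unrecognized input."""
    tokens = []
    pos = 0
    while pos < len(request):
        match = TOKEN_RE.match(request, pos)
        if match is None:
            raise ValueError(f"unrecognized input at position {pos}")
        if match.lastgroup != "space":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def has_operator(tokens):
    """Remark 5: the response must contain at least one Boolean operator."""
    return any(kind == "op" for kind, _text in tokens)


tokens = tokenize("'PERFORMANCE' OR .500*('PRECISION' OR 'RECALL')")
print(tokens)
print(has_operator(tokenize("'AUTOMATION'")))                   # single term only
print(has_operator(tokenize("'AUTOMATION' AND 'AUTOMATION'")))  # term-operator-term
```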



Q05 024 DOCUMENTS HAVE BEEN RETRIEVED
    DO YOU WANT RESULTS DISPLAYED?

(i.e., Twenty-four documents have been retrieved
by this request. Do you want to see accession
numbers and relevance scores for these documents?
Note: 024 is only an example. Any number of
documents might have been retrieved.)

Valid Responses: YES
                 NO

A YES response will generate a display of the accession
numbers of the documents that satisfy the request. If the responses
to both Q03 and Q05 were YES, then LABSRC3C will also display the
probable relevance score for each retrieved document as well as
the accession number. The relevance score will appear to the
right of the accession number on the output display, as shown in
Fig. 1.

If the response to Q05 is NO, the program will continue to Q06.

Q06 SPECIFY RESTART OR EXIT

Valid Responses: RESTART
                 EXIT

A RESTART response returns you to Q01.

EXIT causes LABSRC3C to be unloaded and returns control to the
Terminal Monitor System. At this point another program may
be requested (including LABSRC3C), or the session may be terminated.
To sign off, the procedure is

hit CLEAR
hit TYPE
type LOGOUT
hit SEND BLOCK




FIG. 1

Example of a LABSRC3C output display in response to a search
request in Associative Retrieval Mode, with scoring requested.
Eleven documents have been retrieved, and are displayed in
accession number order, together with their relevance numbers.
Thus, for example, the first document that satisfies the
request is A28, and its relevance number with respect to this
request is .126.



3. COMMANDS

3.1 Introduction

The normal program flow described in Chapter 2 provides a basic
automatic retrieval facility, but LABSRC3C also contains a command
language which considerably increases the system's flexibility.
These commands allow for altering the program flow, modifying
requests, displaying selected portions of the list of retrieved
documents, and displaying the MASTERI and MASTERA files and the
association tables.

The components of a command in LABSRC3C are a verb and an
object, although some commands use the verb alone. This verb-object
combination is the basic pattern of the command. The program scans
for this phrase in interpreting the input from the terminal. Other
words may be added to the command if the user wants to enter a more
complete English expression. For example:

GET 'B1218'

(the basic pattern) or

GET THE INDEX TERMS FOR DOCUMENT 'B1218'

will both cause the index terms for document B1218 to be retrieved,
but use of the basic pattern minimizes the amount of typing required
to communicate with the system.
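
The idea of scanning input for a verb and an object while ignoring the other words can be sketched as follows. This is an assumption-laden illustration: the function name parse_command is hypothetical, the sketch takes the first word as the verb and the first accession-like token as the object, and the actual scanning rules of LABSRC3C are not given in this manual. The quoting of the accession number in the example inputs is likewise assumed.

```python
import re


def parse_command(text):
    """Return (verb, accession number or None) for a GET-style command."""
    words = text.split()
    verb = words[0].upper() if words else ""
    # Object: first token that looks like an accession number (A, B, or X
    # followed by four digits); all other words are ignored.
    match = re.search(r"\b[ABX]\d{4}\b", text.upper())
    return verb, (match.group(0) if match else None)


print(parse_command("GET 'B1218'"))
print(parse_command("GET THE INDEX TERMS FOR DOCUMENT 'B1218'"))
```

Both inputs reduce to the same verb-object pair, which is the sense in which the longer English form and the basic pattern are equivalent.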

Table 2 lists the basic patterns of the commands currently
available in LABSRC3C in alphabetical order, briefly indicates
their use, and serves as an index to this chapter.

3.2 GO TO: Commands for Changing the Normal Program Flow

The GO TO command in LABSRC3C causes the program to branch
to the question indicated in the command. The basic pattern of
this command is GO TO, followed by the number of the question to
which you want the program to skip, e.g.,

GO TO Q04

If the GO TO command refers to a question number less than the
number of the question currently displayed, the branch is referred to
as a backward branch. Otherwise, it is a forward branch.

A forward branch enables you to skip over some questions.
The system will then automatically substitute its own answers for
the questions that you have skipped, even if you previously
specified other answers.

These internally supplied answers are called default options.
They are listed in Table 3.



Table 2: Index to the Commands

[Table body not recoverable from the source scan.]



Table 3. Default Options

Question No.                               Default Option

Q01 - DO YOU WANT WORD ASSOCIATION?        YES
Q02 - PLEASE SPECIFY ASSOCIATION FILE.     KUHNSG
Q03 - DO YOU WANT SCORING?                 YES
Q04 - ENTER BOOLEAN EXPRESSION.            previous Boolean expression
Q05 - DO YOU WANT RESULTS DISPLAYED?       NO
Q06 - SPECIFY RESTART OR EXIT.             EXIT



Say, for example, that in response to

Q01 DO YOU WANT WORD ASSOCIATION?

you type

GO TO Q04

The system will internally establish the answers YES to Q01, KUHNSG
to Q02, and YES to Q03, and the next message to be displayed will be

Q04 ENTER BOOLEAN EXPRESSION.

Q04 itself cannot have a default option unless a Boolean
expression has already been entered on a previous pass through
the program; therefore, it makes no sense to execute a forward
branch that skips over Q04 when you have just begun to run a search
on LABSRC3C. Nevertheless, the default option for Q04 is one of
the most powerful conveniences of the program because, once the
request is submitted, you can change the association file and repeat
the search without having to retype the request.
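
The effect of a forward branch on the stored answers can be sketched as follows. This is a model of the behavior described above, not the program's own logic: the function name forward_branch and the dictionary layout are hypothetical, and the default values are those listed in Table 3.

```python
# Default options from Table 3, keyed by question number. Q04 has no fixed
# default: it reuses the previous Boolean expression, if there is one.
DEFAULTS = {1: "YES", 2: "KUHNSG", 3: "YES", 5: "NO", 6: "EXIT"}


def forward_branch(current, target, answers, previous_expression=None):
    """Return the answers in force after GO TO <target> typed at <current>.

    Every question from the current one up to, but not including, the
    target receives its default, overriding any earlier answer.
    """
    if current == 4:
        raise ValueError("commands may not be entered in response to Q04")
    if target <= current:
        raise ValueError("not a forward branch")
    filled = dict(answers)
    for question in range(current, target):
        if question == 4:
            if previous_expression is None:
                raise ValueError("Q04 has no default on a first pass")
            filled[question] = previous_expression
        else:
            filled[question] = DEFAULTS[question]
    return filled


# GO TO Q04 typed in response to Q01 on a first pass.
print(forward_branch(1, 4, {}))

# GO TO Q05 typed in response to Q03, after KUHNSW was given for Q02.
print(forward_branch(3, 5, {2: "KUHNSW"},
                     previous_expression="'LANGUAGE' AND 'GRAMMAR'"))
```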



A backward branch is used for modifying a previous input and, 
therefore, will most likely be used in reply to Q 06 . 

For instance, perhaps you have retrieved a set of documents
by having the system search using the KUHNSG file and now would
like to repeat the search on the KUHNSW file. After the system has
displayed the results of the KUHNSG search, the next step in the
program will be

      Q06 SPECIFY RESTART OR EXIT.

To this you reply

      GO TO Q02

and the system will respond with

      Q02 SPECIFY ASSOCIATION FILE.

Your response will be

      KUHNSW

and the system will proceed to

      Q03 DO YOU WANT SCORING?

But the default option for this question is YES, and furthermore,
you do not need to resubmit your Boolean expression. So instead
of typing YES to Q03, which will cause the program to advance to
Q04 and force you to retype your request, you type the command

      GO TO Q05

and the system will now re-execute the search using your new
association measure.

Note: There is an important difference between the inputs GO TO
Q01 and RESTART. RESTART is a valid response to Q06 only, whereas
GO TO Q01 may be sent in response to any question except Q04.

3.3 EDIT and SHOW -- Commands for Modifying and Displaying the
Boolean Request Expression and the Other Request Specifications

3.3.1 EDIT

This command enables you to change individual elements in a
search request without having to retype the entire Boolean
expression; thus, it is normally used as a reply to Q06. In response
to the EDIT command, your previously typed Boolean expression will
reappear on the screen, prefaced by the invitation EDIT AND THEN
SEND BLOCK. At this point you may

      • add or delete operands

      • add or change Boolean operators

      • replace any or all of your original operands
        with new ones

      • add new weights

      • change weights

These changes are made in the same way as error-correcting
procedures for CRT inputs; i.e., by using the SPACE key to move
the flashing cursor to positions on the terminal display screen
corresponding to the characters you wish to delete, replace, or
add. Once you have made the desired change in the Boolean
expression, you complete the step in the usual way by hitting SEND
BLOCK, and the system will now search for documents that satisfy
your new request.



The EDIT command is especially useful for varying the elements of
the Boolean expression one at a time to see what effect on
retrieval is produced by each of these changes. For example,
suppose you wish to compare the set of document citations retrieved
in response to the request

      ('LINGUISTIC' OR 'NATURAL LANGUAGE') AND 'ANALYSIS'

with the retrieval that will be obtained if the index term LANGUAGE
is substituted for NATURAL LANGUAGE. Use of the command GO TO Q04
will necessitate completely retyping the search request, just as
if the previous one had never been submitted. Using EDIT instead
preserves the useful elements of the previous request. Instead
of retyping the entire string, simply delete the word NATURAL
from the original search expression to obtain the new one.

3.3.2 SHOW

This command provides a status report on your request, showing
the current answers to questions Q01 to Q04. For the present,
ignore the second "page" of this display. Its structure reflects
distinctions which are not pertinent to the present system
implementation. SHOW is especially useful after you have been
putting the system through a long series of request modifications
and may not now be sure you remember all of the search specifications.

3.4 DISPLAY DOCUMENTS, SORTA, SORTD -- Commands for Sorting,
Selecting, and Displaying the Set of Retrieved Document
Accession Numbers and Relevance Scores

3.4.1 DISPLAY DOCUMENTS

When DISPLAY DOCUMENTS is entered, document accession numbers
and relevance values will be displayed on the CRT's. To call up
a list of the documents that satisfy your input expression, you
type

      DISPLAY DOCUMENTS

This command, therefore, has the same effect as a YES response
to Q05, but may be entered at any point in the normal program
flow.

Its greatest advantage, however, is that it offers the
opportunity to limit the length of the list of retrieved documents,
an especially welcome capability when the number of documents which
satisfy the input expression is inconveniently large. In such
an event, you may request a restricted display by specifying the
number of citations you wish to examine; e.g.,

      DISPLAY 7 DOCUMENTS

Alternatively, you may restrict the size of this list by
specifying a threshold relevance score or "cut-off" point:*

      DISPLAY 10 DOCUMENTS *GT* .500

In this case the system response will depend on the number
of documents that satisfy the restrictions specified in your
command. If there are more than ten documents with relevance scores
greater than .5, only ten will be displayed on the CRT's; but
if there are fewer than ten documents satisfying the threshold
requirement, then only that smaller number of documents will be
displayed.
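
The selection rule just described (apply the threshold first, then
show at most the requested number) can be sketched in Python as
follows. The command grammar in COMMAND and the function name
display_documents are assumptions made for illustration; the manual
does not give a formal syntax, and this is not LABSRC3C's own parser.

```python
import operator
import re

# The three relational expressions named in the manual.
RELATIONS = {"*GT*": operator.gt, "*EQ*": operator.eq, "*LT*": operator.lt}

# Assumed pattern: DISPLAY [n] DOCUMENTS [*GT*|*EQ*|*LT* score]
COMMAND = re.compile(
    r"^DISPLAY(?:\s+(\d+))?\s+DOCUMENTS"
    r"(?:\s+(\*(?:GT|EQ|LT)\*)\s+(\d*\.\d+))?$"
)


def display_documents(command, retrieved):
    """Return the (accession number, score) pairs the command would list.

    `retrieved` is a list of (accession_number, relevance_score) pairs
    in the order in which the retrieved set currently holds them.
    """
    match = COMMAND.match(command.strip())
    if match is None:
        raise ValueError("unrecognized command: " + command)
    limit, relation, threshold = match.groups()
    selected = retrieved
    if relation is not None:
        # Keep only documents whose score satisfies the relation.
        compare = RELATIONS[relation]
        selected = [d for d in selected if compare(d[1], float(threshold))]
    if limit is not None:
        # Show at most the requested number of documents.
        selected = selected[: int(limit)]
    return selected
```

With a retrieved set holding two documents above .500, the command
DISPLAY 10 DOCUMENTS *GT* .500 returns just those two, which matches
the "only that smaller number" case in the text.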

A third method for using DISPLAY to select portions of the
file is to order the output using SORTD (Sec. 3.4.2) and then
specify the number of documents desired. For example, these two
commands can be used to display the 5 documents with the
highest relevance scores. Assume the input expression was

      ('AUTO. INDEXING' OR 'MANUAL INDEXING') AND 'INFO. RETRIEVAL'

The computer responds with the number of documents that satisfy
the expression and Q05 - DO YOU WANT RESULTS DISPLAYED? You then
type SORTD. The computer will respond with Q05 again. If you
then type DISPLAY 5 DOCUMENTS, the program displays the five
documents with the highest relevance scores.

3.4.2 SORTA and SORTD 

The SORTA and SORTD commands are used to sort the documents
by their relevance scores. The SORTA command sorts them in ascending
order; i.e., the documents with the lowest relevance scores are
listed first. SORTD sorts in descending order, producing a ranked
output with the documents with the highest relevance scores
listed first.

SORTA and SORTD are used only when scoring has been requested
previously in response to Q03, and are usually entered in response
to

      Q05 DO YOU WANT RESULTS DISPLAYED?

SORTA and SORTD can also be used in combination with other
commands such as DISPLAY to output selected portions of the set
of retrieved documents as described in Sec. 3.4.1.
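
As a small illustration of the two orderings, the following Python
sketch sorts a retrieved set by relevance score. The function name
sort_documents and the (accession number, score) pair layout are
assumptions for this example only.

```python
def sort_documents(retrieved, command):
    """Order (accession_number, relevance_score) pairs by score.

    SORTA lists the lowest scores first, SORTD the highest scores first.
    """
    descending = {"SORTA": False, "SORTD": True}[command]
    return sorted(retrieved, key=lambda doc: doc[1], reverse=descending)
```

Taking the first five entries of the SORTD result corresponds to
typing SORTD followed by DISPLAY 5 DOCUMENTS.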

Similarly, instead of following SORTD with the DISPLAY command,
you might prefer to type either

      GET 10 DOCUMENTS (See Sec. 3.5.1)

or

      RETRIEVE 10 DOCUMENTS (See Sec. 3.5.3)

The use of SORTD in connection with GET and RETRIEVE is
especially strategic, because GET and RETRIEVE both result in
the output of a rather large amount of data about each document,
and it can be tiresome to read through a long series of such
representations.

* The expression *GT* is the canonical abbreviation of the phrase
"greater than." The command language also contains the expressions
*EQ* and *LT*, "equal to" and "less than." These alternatives may
be used in place of *GT* in any command that has *GT* as an
element of its basic pattern.

3.5 GET, DISPLAY, and RETRIEVE -- Commands for Displaying Data
Files

3.5.1 GET

GET is used to display records from the MASTERI file, showing
which index terms have been assigned to each document. There
are two forms of the GET command. One of them calls for display
of the indexing of a particular document; the other indicates
how many of the MASTERI records representing documents retrieved
by the current request are desired. To retrieve the indexing
for a particular document, enter GET plus the document number;
for example,

      1. GET 'B1218'

Note that the document number must consist of a letter
followed by four digits, and be enclosed in single quotes.

Fig. 2 shows the display that will be provided in response
to this command.

FIG. 2: MASTERI RECORD FOR DOCUMENT B1218

B121801   AUTO. INDEXING     CONNECTION        DESCRIPTOR   DOCUMENT
B121802   EVALUATION         INTRODUCTORY      MATCH        MATCH
B121803   PROBABILITY        QUESTION-ANSWER   RECALL       RELEVANCE
B121804   SEMANTIC           STATISTICAL       THESAURUS    VOCABULARY
B121805   WORD ASSOCIATION



Any MASTERI record at all may be called up by this form of
the GET command, regardless of whether it is a record for a
document that satisfies a search request. In fact, you need not
enter a Boolean expression at all in order to call for a
display of a record from the MASTERI file.
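
The format rule for document numbers stated above (one letter followed
by four digits, enclosed in single quotes) lends itself to a simple
check. The Python sketch below is illustrative only; the name
parse_document_number is hypothetical, and the restriction to capital
letters is an assumption based on the examples in this manual.

```python
import re

# One capital letter followed by four digits, in single quotes.
DOCUMENT_NUMBER = re.compile(r"^'([A-Z][0-9]{4})'$")


def parse_document_number(operand):
    """Return the bare document number, e.g. B1218 for 'B1218'.

    Raises ValueError if the operand does not follow the stated format.
    """
    match = DOCUMENT_NUMBER.match(operand.strip())
    if match is None:
        raise ValueError("invalid document number: " + operand)
    return match.group(1)
```

An operand typed without quotes, or with too few digits, is rejected
before any file lookup would take place.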
The second form of the GET command does not require the
specification of document numbers, but simply indicates the number of
MASTERI records to be displayed. However, it selects documents
only from the retrieved set. To see the documents in the retrieved
set enter

      2. GET DOCUMENTS

If, on the other hand, you don't want to see all the documents
in the retrieved set, you can limit the number displayed. For
example,

      3. GET 5 DOCUMENTS

If the documents have been ordered by SORTD, this command will
retrieve the five documents with the highest relevance scores;
otherwise, it retrieves them in accession number order.

3.5.2 DISPLAY 

The use of DISPLAY to display the set of retrieved document
accession numbers and relevance scores is discussed in Section
3.4.1. This section describes its use to display association
tables.*

Assume that you have entered the request

      ('AUTO. INDEXING' AND 'MANUAL INDEXING') OR 'RETRIEVAL'

and you wish to know which terms are most highly associated with
AUTO. INDEXING. To find out, you type:

      1. DISPLAY 'AUTO. INDEXING'

The system will now respond by displaying the association table
for AUTO. INDEXING, as shown in Fig. 3.



FIG. 3: TYPICAL ASSOCIATION TABLE

      AUTO. INDEXING      .9999
      CRITICAL            .9300
      BATCH PROCESSING    .8800
      SCOPE NOTE          .4300
      RELATIVE            .3586

* Association tables are defined in Sec. 1.6.





The table reveals that the term most highly associated with AUTO.
INDEXING is 'CRITICAL,' and that the association value of the pair
of terms 'AUTO. INDEXING' and 'CRITICAL' is .9300.

If you wish to see the association tables for each term in
your request, it is not necessary to specify a DISPLAY command for
each individual term in your Boolean expression. Instead, you
simply type:

      2. DISPLAY

and the system will respond by displaying the association tables
for each term in your Boolean expression, one at a time. When
you are ready to have the next table displayed, hit SEND BLOCK.

3.5.3 RETRIEVE 

RETRIEVE is used to display records from the MASTERA file,
containing the authors, titles, and abstracts of the documents.

There are four forms of this command.

      1. RETRIEVE DOCUMENTS

The system will display the MASTERA records for all of the
documents retrieved by the current search.

      2. RETRIEVE DOCUMENTS *GT* .600

Here the system will display the abstracts of only those
documents whose relevance score is greater than .600 (or whatever
figure you specify in your command).**

      3. RETRIEVE 5 DOCUMENTS

In this case the system will display the abstracts for the
first 5 (or whatever figure you specify) documents retrieved in
response to your request.

Remember that a YES response to Q05 causes the system to
display document numbers in accession number order; hence the first
5 documents in the retrieval output will not necessarily be the
5 most relevant items. It is possible, however, to obtain a
ranked output by responding to Q05 with a SORTD command instead
of YES (See Sec. 3.4.2). If this sort has been done, the
RETRIEVE command will display documents in order of decreasing
relevance score.



** *LT* (less than) and *EQ* (equal to) may be substituted for *GT*
in any command in which *GT* occurs, but remember that these
expressions will not be understood by the program unless they are
enclosed in asterisks; e.g., *LT*, *EQ*.

      4. RETRIEVE 'B1218'



This command will have the system display the abstract of
any particular document that you specify. The document number
must be enclosed in single quotes and consist of one alphabetic
character followed by four digits. The display that would be
produced in response to this command is shown in Fig. 4.

The abstract of any document at all in the corpus may be
called up by this form of the RETRIEVE command. The document
need not be one of the ones retrieved in response to your search
request.

The abstracts used for the MASTERA file have been for the
most part taken from standard abstracting journals, such as
Computing Reviews or Documentation Abstracts, and the source for
each of the abstracts is given on the last line of the MASTERA
record. In the example shown in Fig. 4, the word DOC on
the last line signifies that the abstract that was printed as
part of the document itself was the source for this particular
MASTERA record.
FIG. 4: MASTERA RECORD FOR DOCUMENT B1218

SCORING



4.1 General 



Typing a YES response to Q03 DO YOU WANT SCORING? causes
LABSRC3C to compute a relevance score in the range (0,1) for
each document, which is one possible measure of the closeness
of that document's index term set to your input request. The
relevance scores are computed from the association values of
the terms assigned to the documents. The actual computational
procedure depends on the kinds of Boolean operators in the search
request. These procedures are discussed one at a time in the
following sections.

4.2 Requests with AND Operator

For a search in associative retrieval mode in which all the
request terms are connected by AND operators, LABSRC3C will
retrieve only those documents which have in their indexing a term
from the association table of each operand in the original request.
The expanded request is the original request plus the terms from
the association tables added in associative mode to the original
terms (see Sec. 1.6). The relevance score for the document is
computed by multiplying together, one value per operand, the
association values of those index terms of the document that
appear in the expanded request. If more than one term from the
same association table is present in a document's indexing and
there is, therefore, more than one value to choose from for an
operand, the highest value will be selected.



Example: Suppose the request

      'CLASSIFICATION' AND 'CURRICULUM'

were submitted, and word association using the KUHNSL file and
scoring requested. Word association will expand the request
terms as follows:

      Index Term       Assoc. Value    Index Term            Assoc. Value

      CLASSIFICATION   .9999           CURRICULUM            .9999
      LATTICE          .3191           EDUCATION             .6093
      CATEGORIES       .2812           PHILOSOPHY            .5169
      CLUMP            .2703           INFO. SCIENCE         .4496
      PREDICTION       .2179           INTERDISCIPLINAR(Y)   .4439


Among the documents retrieved in response to this request, we
find

      A0049, "Librarianship and the Science of Information,"
             by J.O. Donahue, American Documentation 17
             (July 1966), 120-3, with a relevance score of .996,

and

      A0121, "Information Science and Liberal Education" by B.F.
             Cheydleur, American Documentation 16 (July 1965),
             171-7, with a relevance score of .170.



A0049 is indexed under the following terms:

      CATALOGING      CLASSIFICATION   CURRICULUM   EDUCATION
      INFO. SCIENCE   LIBRARIAN        PHILOSOPHY

It was retrieved because both 'CLASSIFICATION' and 'CURRICULUM'
are among its index terms, and these are precisely the terms
specified by the request. From the table of association values,
we see that 'CLASSIFICATION' and 'CURRICULUM' each have
association values of .9999 with themselves. Since the terms were
connected by the AND operator, the relevance score for A0049 will be
computed by multiplying the association values:

      .9999 x .9999 = .9998, which, however, comes out as
      .996 due to rounding properties of LABSRC3C's
      multiplication algorithm.

Note that A0049 is also indexed under 'EDUCATION' and
'INTERDISCIPLINARY', which are all among the terms of the expanded
request, since they are highly associated with 'CURRICULUM'. When
there is more than one term in the expanded operand that is also
in the indexing of a document, as mentioned above, LABSRC3C uses
only the value for the term with the highest association value
in performing the relevance computations. Since the association
value of 'CURRICULUM' with itself (.9999) is, of course, at least
as high as its association with any other term, and 'CURRICULUM'
is present in the document's indexing, .9999 is used.

Document A0121 is indexed under the following terms:

      CATEGORIES        CENTERS               CITATION INDEX   CODING
      COMPUTER          DATA                  DOCUMENT         EDITING
      EDUCATION         IDENTIFICATION        INDEXING         INFO. RETRIEVAL
      INFO. SCIENCE     INTERDISCIPLINAR(Y)   INTERFACE        LANGUAGE
      LOGIC             MAN-MACHINE           MATHEMATICS      PROCESSING
      QUESTION-ANSWER   SCOPE NOTE            STORAGE          STRUCTURE
      SYSTEM

This document is not indexed under 'CLASSIFICATION', but it is
indexed under 'CATEGORIES', which has an association of .2812 with
'CLASSIFICATION'. Similarly, we do not find the second term of
the request, 'CURRICULUM', among the terms which index A0121,
but we do find 'EDUCATION', 'INFO. SCIENCE' and 'INTERDISCIPLINARY',
which have association values of .6093, .4496 and .4439
respectively with 'CURRICULUM'. Since the indexing for A0121 contains
at least one term which is highly associated with each of the
request terms, this document was retrieved, even though there is
no direct match between its index terms and those of the request.

Its relevance score will be computed by multiplying the
association values of the most closely associated terms, 'CATEGORIES'
and 'EDUCATION', which are

      .2812 x .6093 = .1713, which comes out as .170, due to
      rounding procedures in the computer multiplication
      program.
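
The AND scoring rule of this section can be checked with a short
Python sketch. It uses the two association tables from the example
above; the names ASSOCIATIONS and and_score are hypothetical. The
sketch uses ordinary floating-point multiplication and therefore does
not reproduce the rounding behavior of LABSRC3C's own multiplication
routine (.996 and .170 in the text).

```python
from math import prod

# Association tables from the example above (KUHNSL file).
ASSOCIATIONS = {
    "CLASSIFICATION": {
        "CLASSIFICATION": 0.9999, "LATTICE": 0.3191, "CATEGORIES": 0.2812,
        "CLUMP": 0.2703, "PREDICTION": 0.2179,
    },
    "CURRICULUM": {
        "CURRICULUM": 0.9999, "EDUCATION": 0.6093, "PHILOSOPHY": 0.5169,
        "INFO. SCIENCE": 0.4496, "INTERDISCIPLINAR(Y)": 0.4439,
    },
}


def and_score(operands, index_terms):
    """Score a document against a request whose terms are all ANDed.

    For each operand, take the highest association value among the
    document's index terms; multiply those values together. Returns
    None if some operand has no matching term (document not retrieved).
    """
    best = []
    for operand in operands:
        values = [value for term, value in ASSOCIATIONS[operand].items()
                  if term in index_terms]
        if not values:
            return None
        best.append(max(values))
    return prod(best)
```

Applied to the indexing of A0121, the function selects .2812
(CATEGORIES) and .6093 (EDUCATION) and returns their product, about
.1713, as in the worked example.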

4.3 Requests with OR Operator

If all request terms are connected by OR operators, any
document that has at least one of the request terms in its indexing
will be retrieved. In searching in associative retrieval mode,
the relevance score for the retrieved document will be computed
as follows:

      1) Consider the terms in the expanded request which also
         appear in the indexing of the document in question.

      2) Select from these the term which has the highest
         association value with any of the original terms.

      3) Assign the association value of that term as the relevance
         score of the document.
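
The three steps above amount to taking a maximum. The Python sketch
below illustrates this with the association values that this chapter
quotes for 'CLASSIFICATION' (Sec. 4.2) and 'CLASSIF. SCHEME'; the
function name or_score and the dictionary layout are assumptions made
for the example.

```python
def or_score(tables, index_terms):
    """Score a document against a request whose terms are all ORed.

    `tables` holds one association table (term -> value) per operand.
    Step 1: keep the expanded-request terms found in the indexing.
    Steps 2 and 3: the highest of their values is the relevance score.
    Returns None when no term matches (document not retrieved).
    """
    values = [value
              for table in tables
              for term, value in table.items()
              if term in index_terms]
    return max(values) if values else None


CLASSIFICATION = {"CLASSIFICATION": 0.9999, "LATTICE": 0.3191,
                  "CATEGORIES": 0.2812, "CLUMP": 0.2703,
                  "PREDICTION": 0.2179}
CLASSIF_SCHEME = {"CLASSIF. SCHEME": 0.9999, "MANUAL INDEXING": 0.3012,
                  "STAT. METHOD": 0.2562, "TAG": 0.2380,
                  "CATEGORIES": 0.2380}
```

For a document indexed under CATEGORIES but under none of the other
listed terms, the two candidate values are .2812 and .2380, and the
higher one, .2812, becomes the score.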

Example: Suppose the request is

      'CLASSIFICATION' OR 'CLASSIF. SCHEME'

using the KUHNSL association measure. The association table for
'CLASSIFICATION' is, of course, the same as shown in Section 4.2.
The table for 'CLASSIF. SCHEME' is

      CLASSIF. SCHEME    .9999
      MANUAL INDEXING    .3012
      STAT. METHOD       .2562
      TAG                .2380
      CATEGORIES         .2380

Once again document A0121 will be retrieved. In this particular
example there is a term in the indexing associated with both operands
(although that is not necessary to retrieve the document), and, as
frequently happens when the two operands are semantically as
closely related as these two, it is the same term: CATEGORIES.

CATEGORIES is associated with 'CLASSIFICATION' with an
association value of .2812 and with 'CLASSIF. SCHEME' with a value of
.2380. Since the scoring algorithm selects the higher value,
the relevance value of A0121 relative to this request
expression will be .2812.

4.4 Requests with NOT Operator

If a request term is preceded by the NOT operator, then no
document indexed under that term will be retrieved, no matter how
many other terms it may be indexed under that correspond to the
non-negated specifications of the request. Thus the only effect
that the NOT operator has on scoring is that the presence of the
negated term in the indexing of a document sends that document's
relevance score to zero, and the document is not retrieved.

Negated terms are not expanded, even when searching in
associative retrieval mode. There must be a direct match
between a document's index term and a negated request term in order
for the document to be rejected by virtue of the presence of the
NOT operator in the request. The presence of a term associated
with a negated term does not disqualify the document. So, for
example, if there is a request of the form 'A' AND NOT 'B', and
there is a document which is not indexed under either A or B,
but is indexed under a term which is associated with both A
and B, that document will be retrieved and assigned a relevance
score equal to that term's association value with term A.

Normally a term or any one of its associated terms will be
considered in the search. Negating a term asks the system to
check each document to make sure that the negated term does not
appear; thus even though the term is negated, it is still
included in the search. Therefore, when the association tables
for a negated term are displayed, the term itself will not be
followed by the asterisk which is used to indicate that a term
is not currently being considered. The terms associated
with the negated term, on the other hand, are not used to expand
or restrict the request and thus are followed by asterisks in
the association tables. (Note: if a term occurs in the list
of a non-negated term as well as the negated one, it will be
used to expand the non-negated term.)

Example: Consider a search in associative retrieval mode,
carried out for the Boolean expression

      'RELEVANCE' AND 'RECALL' AND NOT 'PRECISION'

The association tables for these terms will be displayed on the
CRT's as follows:

      RELEVANCE          RECALL                    PRECISION
      RECALL             PRECISION                 RECALL *
      MEASURE            CRANFIELD                 CRANFIELD *
      RELEV. JUDGMENT    EVALUATION                PERFORMANCE *
      RANK               FACETED CLASSIFICATION    RANK *

This request is particularly interesting because one of the
desired request terms, 'RECALL', is highly associated with the
undesired term, 'PRECISION', and vice versa.



In this case the NOT operator takes precedence over the
association data in the search. This is not to say that the
request term 'RECALL' is not expanded: it means that 'RECALL'
itself is expanded but that any document retrieved which also
has been assigned the term 'PRECISION' will be excluded from
the search result. Thus a document indexed under 'RELEVANCE'
and 'PRECISION' will not be retrieved, even though 'PRECISION' is
highly associated with the desired term 'RECALL'. On the other
hand, a document indexed under 'RELEVANCE' and 'CRANFIELD', but
not indexed under 'PRECISION', will be retrieved, since
'PRECISION' is not expanded. The fact that 'CRANFIELD' is highly
associated with the undesired term 'PRECISION' is ignored, and
the document is retrieved and given a relevance number in the
manner described in Sec. 4.2.
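
The retrieval decision in this example can be sketched in Python as
follows. Only the retrieve-or-reject step is modeled, since the tables
above are shown without association values. The names ASSOCIATED and
is_retrieved are hypothetical, and the sketch assumes that the check
for a negated term is made before any expansion is consulted.

```python
# Association lists for the non-negated request terms, as displayed
# in the example above. Negated terms are never expanded, so
# PRECISION needs no entry here.
ASSOCIATED = {
    "RELEVANCE": {"RECALL", "MEASURE", "RELEV. JUDGMENT", "RANK"},
    "RECALL": {"PRECISION", "CRANFIELD", "EVALUATION",
               "FACETED CLASSIFICATION"},
}


def is_retrieved(required, negated, index_terms):
    """Decide retrieval for 'A' AND 'B' AND NOT 'C' style requests."""
    index_terms = set(index_terms)
    # A negated term rejects a document only on a direct match.
    if index_terms & set(negated):
        return False
    # Each non-negated term may match directly or through its
    # association list.
    for term in required:
        expanded = {term} | ASSOCIATED.get(term, set())
        if not (expanded & index_terms):
            return False
    return True
```

With this sketch, a document indexed under RELEVANCE and PRECISION is
rejected, while one indexed under RELEVANCE and CRANFIELD is
retrieved, in agreement with the discussion above.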

4.5 Weighting

4.5.1 The Effect of Weights on Relevance Scores

A weight may be any three-digit decimal number from .000 to
.999, and it is used to de-emphasize the term or the operand
immediately following it in the Boolean expression. The weights
may be applied to

      (1) single term operands; e.g.,

            .500*'AUTO. INDEXING' OR 'MANUAL INDEXING'

   or (2) operands containing more than one term; e.g.,

            .500*('AUTO. INDEXING' AND 'MANUAL INDEXING')
                  OR 'ABSTRACTING'

   or (3) terms within operands; e.g.,

            (.500*'AUTO. INDEXING' OR 'MANUAL INDEXING')
                  AND 'ABSTRACTING'

An unweighted request term is automatically assigned a weight
corresponding to its association value with itself.* When a
different weight is explicitly assigned to a term in the Boolean
expression, the association value of the term (as well as the
association values of the four terms most highly associated with
that term) is multiplied by the assigned weight. This operation
will result in a lower relevance score for the documents that
are retrieved because of the presence of this term in their
indexing.
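
The effect of an explicit weight on an association table can be
illustrated with a short Python sketch. The table is the one shown in
Fig. 3 for AUTO. INDEXING; the function name apply_weight is
hypothetical, and it is an assumption of this sketch that the weight
is applied to every entry of the table in the same way.

```python
# Association table for AUTO. INDEXING as shown in Fig. 3.
AUTO_INDEXING = {
    "AUTO. INDEXING": 0.9999,
    "CRITICAL": 0.9300,
    "BATCH PROCESSING": 0.8800,
    "SCOPE NOTE": 0.4300,
    "RELATIVE": 0.3586,
}


def apply_weight(weight, table):
    """Multiply the term's own value and those of its associated
    terms by the assigned weight (.000 to .999)."""
    if not 0.0 <= weight <= 0.999:
        raise ValueError("weight must lie between .000 and .999")
    return {term: weight * value for term, value in table.items()}
```

For the request .500*'AUTO. INDEXING' OR 'MANUAL INDEXING', the entry
for CRITICAL would drop from .9300 to .4650 under these assumptions,
so a document retrieved through CRITICAL receives a lower score than
it would without the weight.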



* Most of the files interpret the association value of a term with
itself to be .999. This, however, is not true of the KUHNSG, KUHNSS,
and KUHNSW files. For an explanation of this phenomenon, see
Chapter 5.

Example: Consider the request

      'AUTO. INDEXING' OR 'MANUAL INDEXING'.